Load the necessary libraries
library(mgcv) #for GAMs
library(gratia) #for GAM plots
library(emmeans) #for marginal means etc
library(broom) #for tidy output
library(MuMIn) #for model selection and AICc
library(lubridate) #for processing dates
library(tidyverse) #for data wrangling
library(DHARMa) #for residuals diagnostics
library(performance) #for residual diagnostics
library(see) # to visualize residual diagnostics
library(patchwork) #for grids of plots
The Australian Institute of Marine Science (AIMS) have a long-term inshore marine water quality monitoring program in which water samples are collected and analysed from sites (reef.alias) across the GBR numerous times per year. The focus of this program is to report long-term condition and change in water quality parameters.
Although we do have latitude and longitudes, the nature of the spatial design predominantly reflects a series of transects that start near the mouth of a major river and extend northwards, yet mainly within the open coastal zone. As a result, this design is not well suited to any specific spatial analyses (since they are mainly one dimensional).
AIMS water quality monitoring
Format of aims.wq.csv data file
| LATITUDE | LONGITUDE | reef.alias | Water_Samples | Region | Subregion | Season | waterYear | DOC |
|---|---|---|---|---|---|---|---|---|
| -16.1 | 145. | Cape Trib… | AIMS | Wet T… | Barron D… | Dry | 2008 | 0.830 |
| -16.1 | 145. | Cape Trib… | AIMS | Wet T… | Barron D… | Wet | 2008 | 0.100 |
| -16.1 | 145. | Cape Trib… | AIMS | Wet T… | Barron D… | Dry | 2009 | 0.282 |
| -16.1 | 145. | Cape Trib… | AIMS | Wet T… | Barron D… | Wet | 2009 | 1.27 |
| -16.1 | 145. | Cape Trib… | AIMS | Wet T… | Barron D… | Dry | 2009 | 0.793 |
| -16.1 | 145. | Cape Trib… | AIMS | Wet T… | Barron D… | Dry | 2010 | 0.380 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| LATITUDE | - Latitudinal coordinate |
| LONGITUDE | - Longitudinal coordinate |
| reef.alias | - Internal AIMS reef name |
| Water_Samples | - Categorical label of who collected the data |
| Region | - The MMP region |
| Subregion | - The MMP subregion |
| Season | - A categorical listing of Wet or Dry |
| waterYear | - The water year (1st Oct - 30 Sept) to which the data are attached |
| Date | - The date the sample was collected |
| Mnth | - The month the sample was collected |
| DOC | - Dissolved organic carbon |
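As an aside, the waterYear field can be reconstructed from Date. A minimal sketch, assuming (consistent with the data above, e.g. 2007-10-14 → 2008) that a water year is labelled by the calendar year in which it ends:

```r
library(lubridate)
# Label each sample with its water year (1st Oct - 30 Sept);
# samples from October onwards belong to the following year's label
water_year <- function(d) ifelse(month(d) >= 10, year(d) + 1, year(d))
water_year(as.Date('2007-10-14'))  # 2008
```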
wq = read_csv('../public/data/aims.wq1.csv', trim_ws=TRUE)
glimpse(wq)
## Rows: 1,560
## Columns: 11
## $ LATITUDE <dbl> -16.10842, -16.11152, -16.11160, -16.11164, -16.11167, -…
## $ LONGITUDE <dbl> 145.4827, 145.4833, 145.4833, 145.4806, 145.4817, 145.48…
## $ reef.alias <chr> "Cape Tribulation", "Cape Tribulation", "Cape Tribulatio…
## $ Water_Samples <chr> "AIMS", "AIMS", "AIMS", "AIMS", "AIMS", "AIMS", "AIMS", …
## $ Region <chr> "Wet Tropics", "Wet Tropics", "Wet Tropics", "Wet Tropic…
## $ Subregion <chr> "Barron Daintree", "Barron Daintree", "Barron Daintree",…
## $ Season <chr> "Dry", "Dry", "Wet", "Dry", "Wet", "Wet", "Wet", "Dry", …
## $ waterYear <dbl> 2020, 2008, 2014, 2014, 2019, 2021, 2010, 2010, 2014, 20…
## $ Date <date> 2019-10-21, 2007-10-14, 2014-02-05, 2014-09-24, 2019-02…
## $ Mnth <dbl> 10, 10, 2, 9, 2, 2, 3, 6, 7, 10, 2, 6, 10, 10, 10, 2, 11…
## $ DOC <dbl> 828.2585, 568.7456, 878.4685, 878.0750, 833.2632, 993.00…
We will start by defining each of the categorical variables as factors. We will also define the random effect (reef.alias) as a factor.
wq = wq %>% mutate(reef.alias=factor(reef.alias),
Region=factor(Region),
Subregion=factor(Subregion),
Season=factor(Season))
Model formula: \[ y_i \sim{} \mathcal{N}(\mu_i, \sigma^2)\\ \mu_i =\beta_0 + f(Date_i) + f(Month_i) \]
where \(\beta_0\) is the y-intercept. \(f(Date)\) and \(f(Month)\) indicate the additive smoothing functions of the long-term temporal trends and the annual seasonal trends respectively.
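In mgcv syntax, this formula translates to something along these lines (a sketch only; the models actually fitted below first convert Date to a numeric, Dt.num, and use a cyclical basis for the month smoother):

```r
# Sketch: the s() terms implement the smoothing functions f() of the formula
# above, assuming Dt.num and Mnth are numeric date and month (created below)
gam(DOC ~ s(Dt.num) + s(Mnth, bs = 'cc'),
    data = wq, family = gaussian(), method = 'REML')
```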
If we begin with a quick scatterplot of DOC against Date…
ggplot(wq, aes(y=DOC, x=Date)) + geom_point()
We know that these DOC values have been measured over time from a number of locations (reef.alias). Therefore, it would be good to facet the scatterplot conditional on the reef.alias.
ggplot(wq, aes(y=DOC, x=Date)) +
geom_point() +
facet_wrap(~reef.alias)
Conclusions:
ggplot(wq, aes(y=DOC, x=Date)) +
geom_point() +
geom_smooth() +
scale_y_continuous(trans=scales::pseudo_log_trans()) +
facet_wrap(~reef.alias, scales='free_y')
Conclusions:
Date data cannot be directly modelled; it must first be converted into a
numeric. This can be performed with the decimal_date()
function from the lubridate package.
Although we generally want to scale any continuous predictors, it appears that doing so with Date objects has downstream issues - the models fit ok, but the partial plots are displaced along the x-axis. So, as an alternative, either don't scale, or do so with a routine that does not automatically back-transform (such as sjmisc::std or sjmisc::center).
wq=wq %>% mutate(Dt.num=decimal_date(Date))
#wq=wq %>% mutate(sDt=scale(Date))
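If centring is nevertheless desired, it can be done by hand, which avoids the attributes that scale() attaches (a sketch):

```r
# Centre the numeric date manually - unlike scale(), this returns a plain
# numeric vector with no "scaled:" attributes to trip up downstream plotting
wq <- wq %>% mutate(cDt.num = Dt.num - mean(Dt.num, na.rm = TRUE))
```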
When contemplating fitting a complex model, it is often good practice to build the model up, starting with something relatively simple. That way, if the model fails or yields strange outcomes, it is easier to diagnose the issues. In this spirit, we will begin by building up a temporal model for a single reef (Pandora), before moving on to generalise across multiple reefs in a mixed effects model.
wq.sub=wq %>% filter(reef.alias=='Pandora', !is.na(DOC))
#wq.sub=wq %>% filter(reef.alias=='Green', !is.na(DOC))
ggplot(wq.sub, aes(y=DOC, x=Date)) + geom_point() +
geom_smooth() +
scale_y_log10()
Conclusions:
mgcv package (or any frequentist package for fitting GAM’s
that I am aware of).wq.sub=wq %>% filter(reef.alias=='Pandora', !is.na(DOC))
Actually, it might be worth exploring a variety of models:
wq.gam1 <- gam(DOC ~ s(Dt.num), data=wq.sub, family='gaussian', method='REML')
wq.gam2 <- gam(DOC ~ s(Dt.num), data=wq.sub, family=gaussian(link='log'), method='REML')
wq.gam3 <- gam(DOC ~ s(Dt.num), data=wq.sub, family=Gamma(link='log'), method='REML')
wq.gam4 <- gam(DOC ~ s(Dt.num), data=wq.sub, family=tw(link='log'), method='REML')
library(MuMIn)
AICc(wq.gam1, wq.gam2, wq.gam3, wq.gam4)
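For easier comparison, the AICc table can be sorted (a sketch, assuming the four candidate models above fitted without error; MuMIn::AICc returns a data.frame whose rownames are the model names):

```r
# Sort the candidate models by AICc (smallest, i.e. best, first)
AICc(wq.gam1, wq.gam2, wq.gam3, wq.gam4) %>%
  tibble::rownames_to_column('model') %>%
  arrange(AICc)
```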
Conclusions:
k.check(wq.gam1)
## k' edf k-index p-value
## s(Dt.num) 9 2.967039 0.7748984 0.04
appraise(wq.gam1)
#performance::check_model(wq.gam1)
Conclusions:
wq.resid <- simulateResiduals(wq.gam1, plot=TRUE)
testTemporalAutocorrelation(wq.resid, time=wq.gam1$model$Dt.num)
##
## Durbin-Watson test
##
## data: simulationOutput$scaledResiduals ~ 1
## DW = 1.3861, p-value = 0.01699
## alternative hypothesis: true autocorrelation is not 0
Conclusions:
k.check(wq.gam2)
## k' edf k-index p-value
## s(Dt.num) 9 6.67841 1.070237 0.6925
appraise(wq.gam2)
#performance::check_model(wq.gam2)
Conclusions:
wq.resid <- simulateResiduals(wq.gam2, plot=TRUE)
testTemporalAutocorrelation(wq.resid, time=wq.gam2$model$Dt.num)
##
## Durbin-Watson test
##
## data: simulationOutput$scaledResiduals ~ 1
## DW = 2.0322, p-value = 0.9027
## alternative hypothesis: true autocorrelation is not 0
Conclusions:
k.check(wq.gam3)
## k' edf k-index p-value
## s(Dt.num) 9 6.696727 1.072077 0.6825
appraise(wq.gam3)
#performance::check_model(wq.gam3)
Conclusions:
wq.resid <- simulateResiduals(wq.gam3, plot=TRUE)
testTemporalAutocorrelation(wq.resid, time=wq.gam3$model$Dt.num)
##
## Durbin-Watson test
##
## data: simulationOutput$scaledResiduals ~ 1
## DW = 2.0566, p-value = 0.8298
## alternative hypothesis: true autocorrelation is not 0
Conclusions:
k.check(wq.gam4)
## k' edf k-index p-value
## s(Dt.num) 9 6.697331 1.072102 0.6725
appraise(wq.gam4)
#performance::check_model(wq.gam4)
Conclusions:
wq.resid <- simulateResiduals(wq.gam4, plot=TRUE)
testTemporalAutocorrelation(wq.resid, time=wq.gam4$model$Dt.num)
##
## Durbin-Watson test
##
## data: simulationOutput$scaledResiduals ~ 1
## DW = 2.0507, p-value = 0.8476
## alternative hypothesis: true autocorrelation is not 0
Conclusions:
draw(wq.gam1)
plot(wq.gam1, pages=1, shift=coef(wq.gam1)[1], resid=TRUE, cex=4, scale=0)
draw(wq.gam2)
plot(wq.gam2, pages=1, shift=coef(wq.gam2)[1], resid=TRUE, trans=exp, cex=4, scale=0)
wq.gam3 %>% draw(residuals = TRUE)
plot(wq.gam3, pages=1, shift=coef(wq.gam3)[1], resid=TRUE, trans=exp, cex=4, scale=0)
wq.gam4 %>% draw(residuals = TRUE)
plot(wq.gam4, pages=1, shift=coef(wq.gam4)[1], resid=TRUE, trans=exp, cex=4, scale=0)
summary(wq.gam1)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## DOC ~ s(Dt.num)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 947.57 16.15 58.67 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(Dt.num) 2.967 3.698 12.1 1.63e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.445 Deviance explained = 47.4%
## -REML = 348.17 Scale est. = 14869 n = 57
Conclusions:
tidy(wq.gam1)
summary(wq.gam2)
##
## Family: gaussian
## Link function: log
##
## Formula:
## DOC ~ s(Dt.num)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.84632 0.01523 449.5 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(Dt.num) 6.678 7.794 8.661 4.3e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.58 Deviance explained = 63%
## -REML = 360.91 Scale est. = 11262 n = 57
Conclusions:
tidy(wq.gam2)
summary(wq.gam3)
##
## Family: Gamma
## Link function: log
##
## Formula:
## DOC ~ s(Dt.num)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.84418 0.01459 469.3 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(Dt.num) 6.697 7.815 11.24 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.572 Deviance explained = 66%
## -REML = 358.11 Scale est. = 0.012125 n = 57
Conclusions:
tidy(wq.gam3)
summary(wq.gam4)
##
## Family: Tweedie(p=1.99)
## Link function: log
##
## Formula:
## DOC ~ s(Dt.num)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.84419 0.01425 480.2 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(Dt.num) 6.697 7.816 11.76 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.572 Deviance explained = 66%
## -REML = 358.12 Scale est. = 0.012398 n = 57
Conclusions:
tidy(wq.gam4)
It is likely that DOC levels vary throughout a year. This might be related to river runoff etc. By not accounting for this, the unexplained variation is relatively high and the power associated with the long-term trends will be relatively low.
We have two options for accounting for these intra-annual fluctuations: fit separate long-term trends for each Season, or include an additional cyclical smoother for month alongside the long-term trend.
wq.gam5 <- gam(DOC ~ s(Dt.num, by=Season),
data=wq.sub,
family=Gamma(link='log'), method='REML')
k.check(wq.gam5)
## k' edf k-index p-value
## s(Dt.num):SeasonDry 9 2.312999 0.7446118 0.0375
## s(Dt.num):SeasonWet 9 2.407642 0.7446118 0.0275
appraise(wq.gam5)
#performance::check_model(wq.gam5)
Conclusions:
wq.resid <- simulateResiduals(wq.gam5, plot=TRUE)
testTemporalAutocorrelation(wq.resid, time=wq.gam5$model$Dt.num)
##
## Durbin-Watson test
##
## data: simulationOutput$scaledResiduals ~ 1
## DW = 1.363, p-value = 0.01309
## alternative hypothesis: true autocorrelation is not 0
Conclusions:
wq.gam5 %>% draw(residuals = TRUE)
plot(wq.gam5, pages=1, shift=coef(wq.gam5)[1], resid=TRUE, trans=exp, cex=4, scale=0)
Cyclical smoothers force the start and end of the trend to line up. As such, they assume that the bounds of the data represent the extremes of the cycle. The AIMS water quality marine monitoring program does not visit every site in every month. Importantly, sites are rarely visited in either December or January. Of course, December and January are the bookends of a year.
To ensure that the cyclical smoother extends from December to January, we need to define the full extent of the month codes via the knots argument. When doing so, the number of knots supplied must match the k argument of the smoother (in this case k=5). A k of 5 was chosen arbitrarily to allow some wiggliness without going crazy.
wq.gam6 <- gam(DOC ~ s(Dt.num)+
s(Mnth,bs='cc',k=5,fx=F),
knots=list(Mnth=seq(1,12,length=5)),
data=wq.sub,
family=Gamma(link='log'), method='REML')
k.check(wq.gam6)
## k' edf k-index p-value
## s(Dt.num) 9 7.685973 1.0110005 0.52
## s(Mnth) 3 2.151352 0.8861712 0.18
appraise(wq.gam6)
#performance::check_model(wq.gam6)
Conclusions:
wq.resid <- simulateResiduals(wq.gam6, plot=TRUE)
testTemporalAutocorrelation(wq.resid, time=wq.gam6$model$Dt.num)
##
## Durbin-Watson test
##
## data: simulationOutput$scaledResiduals ~ 1
## DW = 1.9659, p-value = 0.8972
## alternative hypothesis: true autocorrelation is not 0
Conclusions:
wq.gam6 %>% draw(residuals = TRUE)
plot(wq.gam6, pages=1, shift=coef(wq.gam6)[1], resid=TRUE, trans=exp, cex=4, scale=0)
AIC(wq.gam5, wq.gam6)
Conclusions:
summary(wq.gam6)
##
## Family: Gamma
## Link function: log
##
## Formula:
## DOC ~ s(Dt.num) + s(Mnth, bs = "cc", k = 5, fx = F)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.84218 0.01173 583.3 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(Dt.num) 7.686 8.558 16.49 < 2e-16 ***
## s(Mnth) 2.151 3.000 7.17 8.44e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.707 Deviance explained = 79.5%
## -REML = 351.55 Scale est. = 0.0078423 n = 57
Conclusions:
wq.list = with(wq.sub, list(Dt.num = seq(min(Dt.num), max(Dt.num), len=100)))
newdata = emmeans(wq.gam6, ~Dt.num, at=wq.list, type='response') %>% as.data.frame
head(newdata)
g1=ggplot(newdata, aes(y=response, x=date_decimal(Dt.num))) +
geom_ribbon(aes(ymin=lower.CL, ymax=upper.CL), fill='blue', alpha=0.3) +
geom_line() +
scale_x_datetime('') +
theme_bw()
wq.list = with(wq.sub, list(Mnth = seq(1, 12, len=100)))
newdata1 = emmeans(wq.gam6, ~Mnth, at=wq.list, type='response') %>% as.data.frame
head(newdata1)
g2=ggplot(newdata1, aes(y=response, x=Mnth)) +
geom_ribbon(aes(ymin=lower.CL, ymax=upper.CL), fill='blue', alpha=0.3) +
geom_line() +
theme_bw()
#library(gridExtra)
#grid.arrange(g1, g2, nrow=1)
g1 + g2
## Partial residuals
wq.presid = data.frame(Dt.num=wq.gam6$model$Dt.num, Mnth=mean(wq.gam6$model$Mnth)) %>%
mutate(Pred = predict(wq.gam6, newdata=., type='link'),
Resid = wq.gam6$residuals,
Presid = exp(Pred + Resid))
head(wq.presid)
ggplot(newdata, aes(y=response, x=date_decimal(Dt.num))) +
geom_ribbon(aes(ymin=lower.CL, ymax=upper.CL), fill='blue', alpha=0.3) +
geom_line() +
scale_x_datetime('') +
theme_bw() +
geom_point(data=wq.presid, aes(y=Presid)) +
geom_point(data=wq.sub, aes(y=DOC), color='red', alpha=0.5)
wq.presid=with(wq.gam6$model, data.frame(Dt.num=Dt.num, Mnth=mean(Mnth))) %>%
mutate(Resid=exp(as.vector(predict(wq.gam6, newdata=., type='link')) + wq.gam6$residuals))
ggplot(newdata, aes(y=response, x=date_decimal(Dt.num))) +
geom_ribbon(aes(ymin=lower.CL, ymax=upper.CL), fill='blue', alpha=0.3) +
geom_line() +
geom_point(data=wq.presid, aes(y=Resid)) +
scale_x_datetime('') +
theme_bw()
wq.sub <- wq %>%
group_by(reef.alias) %>%
mutate(Min=min(Dt.num)) %>%
ungroup() %>%
filter(Min<2012, Region != 'Fitzroy') %>%
droplevels
## reef=wq %>%
## group_by(reef.alias) %>%
## dplyr:::summarise(Min=min(Dt.num)) %>%
## filter(Min<2012) %>%
## pull(reef.alias)
## reef
## wq2=wq %>% filter(reef.alias %in% reef) %>% droplevels
ggplot(wq.sub, aes(y=DOC,x=Dt.num)) +
geom_point() +
facet_wrap(~reef.alias)
There are numerous ways of fitting mixed effects GAMs:
via the gamm() function, including a random formula. This will run the glmmPQL function behind the scenes. As such, it is not genuine likelihood and is therefore limited.
via the gamm4 function from the package with the same name. This will run glmer() behind the scenes.
via the gam() function, including a random effects smoother.
ggplot(wq, aes(y=DOC,x=Dt.num)) +
geom_point() +
facet_wrap(~reef.alias, scales='free_y')
## Some reefs don't have the full time series
wq.gamm1 <- gamm(DOC ~ s(Dt.num), random=list(reef.alias=~1),
data=wq, family=Gamma(link='log'), method='REML')
##
## Maximum number of PQL iterations: 20
k.check(wq.gamm1$gam)
## k' edf k-index p-value
## s(Dt.num) 9 8.677771 0.3232658 0
appraise(wq.gamm1$gam)
Conclusions:
#wq.resids <- simulateResiduals(wq.gamm1$gam, plot=TRUE)
Conclusions:
draw(wq.gamm1$gam)
plot(wq.gamm1$gam, pages=1, shift=fixef(wq.gamm1$lme)[1], resid=FALSE, trans=exp, cex=4, scale=0)
wq.gamm2 <- gamm4::gamm4(DOC ~ s(Dt.num), random=~(1|reef.alias),
data=wq, family=Gamma(link='log'), REML=TRUE)
k.check(wq.gamm2$gam)
## k' edf k-index p-value
## s(Dt.num) 9 8.081108 0.3093676 0
appraise(wq.gamm2$gam)
Conclusions:
#wq.resids <- simulateResiduals(wq.gamm2$gam, plot=TRUE)
Conclusions:
draw(wq.gamm2$gam)
plot(wq.gamm2$gam, pages=1, shift=fixef(wq.gamm2$mer)[1], resid=FALSE, trans=exp, cex=4, scale=0)
wq.gamm3 <- gam(DOC ~ s(Dt.num) + s(reef.alias, bs='re'),
data=wq, family=Gamma(link='log'), method='REML')
k.check(wq.gamm3)
## k' edf k-index p-value
## s(Dt.num) 9 8.713472 0.3856207 0
## s(reef.alias) 35 33.245650 NA NA
appraise(wq.gamm3)
Conclusions:
wq.resids <- simulateResiduals(wq.gamm3, plot=TRUE)
Conclusions:
draw(wq.gamm3)
plot(wq.gamm3, pages=1, shift=coef(wq.gamm3)[1], resid=FALSE, trans=exp, cex=4, scale=0)
wq.gamm3a <- gam(DOC ~ s(Dt.num, k=20) + s(reef.alias, bs='re'),
data=wq, family=Gamma(link='log'), method='REML')
k.check(wq.gamm3a)
## k' edf k-index p-value
## s(Dt.num) 19 14.92736 0.3935962 0
## s(reef.alias) 35 33.24476 NA NA
appraise(wq.gamm3a)
Conclusions:
wq.resids <- simulateResiduals(wq.gamm3a, plot=TRUE)
Conclusions:
draw(wq.gamm3a)
plot(wq.gamm3a, pages=1, shift=coef(wq.gamm3a)[1], resid=FALSE, trans=exp, cex=4, scale=0)
wq.gamm3b <- gam(DOC ~ s(Dt.num, k=20)+
s(Mnth,bs='cc',k=5) +
s(reef.alias, bs='re'),
knots=list(Mnth=seq(1,12,length=5)),
family=Gamma(link='log'),
data=wq, method='REML')
k.check(wq.gamm3b)
## k' edf k-index p-value
## s(Dt.num) 19 14.709248 0.4267285 0
## s(Mnth) 3 2.771618 0.8070233 0
## s(reef.alias) 35 33.305842 NA NA
appraise(wq.gamm3b)
Conclusions:
the k-index and p-values (for both the long-term and seasonal trends) are low,
and the edf's are approaching the respective k', so it is possible that the number of knots is overly constraining
the residual diagnostics are not too bad
wq.resids <- simulateResiduals(wq.gamm3b, plot=TRUE)
Conclusions:
draw(wq.gamm3b)
plot(wq.gamm3b, pages=1, shift=coef(wq.gamm3b)[1], resid=FALSE, trans=exp, cex=4, scale=0)
## wq.gamm3c <- gam(DOC ~ s(Dt.num, k=20)+
## s(Mnth,bs='cc',k=5) +
## s(Region, bs='re') +
## s(reef.alias, bs='re'),
## knots=list(Mnth=seq(1,12,length=5)),
## family=Gamma(link='log'),
## data=wq, method='REML')
wq.gamm3c <- gam(DOC ~ s(Dt.num, by=Region, k=20)+
s(Mnth,bs='cc', by=Region, k=5) +
s(reef.alias, bs='re'),
knots=list(Mnth=seq(1,12,length=5)),
family=Gamma(link='log'),
data=wq, method='REML')
k.check(wq.gamm3c)
## k' edf k-index p-value
## s(Dt.num):RegionBurdekin 19 9.341666 0.4967757 0
## s(Dt.num):RegionFitzroy 19 14.188452 0.4967757 0
## s(Dt.num):RegionMackay Whitsunday 19 8.460311 0.4967757 0
## s(Dt.num):RegionWet Tropics 19 15.321056 0.4967757 0
## s(Mnth):RegionBurdekin 3 2.385042 0.8856331 0
## s(Mnth):RegionFitzroy 3 2.849394 0.8856331 0
## s(Mnth):RegionMackay Whitsunday 3 2.287301 0.8856331 0
## s(Mnth):RegionWet Tropics 3 2.829132 0.8856331 0
## s(reef.alias) 35 30.159614 NA NA
appraise(wq.gamm3c)
Conclusions:
wq.resids <- simulateResiduals(wq.gamm3c, plot=TRUE)
Conclusions:
draw(wq.gamm3c)
plot(wq.gamm3c, pages=1, shift=coef(wq.gamm3c)[1], resid=FALSE, trans=exp, cex=4, scale=0)
Although the model that included smoothers conditional on Region still had some outstanding issues with the DHARMa residuals (deviation from the linear Q-Q and some non-linear pattern in the residuals), it out-performs the other models and the violations are not too severe. It is highly likely that the model fits well and will yield robust estimates.
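The claim that this model out-performs the others can be checked with an information criterion comparison (a sketch; the comparison is restricted to the gam()-based candidates fitted above, since the gamm()/gamm4() fits use different likelihood machinery):

```r
# Compare the gam()-based mixed-model candidates on AIC (lower is better)
AIC(wq.gamm3, wq.gamm3a, wq.gamm3b, wq.gamm3c)
```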
summary(wq.gamm3c)
##
## Family: Gamma
## Link function: log
##
## Formula:
## DOC ~ s(Dt.num, by = Region, k = 20) + s(Mnth, bs = "cc", by = Region,
## k = 5) + s(reef.alias, bs = "re")
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.87392 0.01505 456.7 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(Dt.num):RegionBurdekin 9.342 11.46 12.230 <2e-16 ***
## s(Dt.num):RegionFitzroy 14.188 14.99 47.439 <2e-16 ***
## s(Dt.num):RegionMackay Whitsunday 8.460 10.42 13.666 <2e-16 ***
## s(Dt.num):RegionWet Tropics 15.321 17.43 24.404 <2e-16 ***
## s(Mnth):RegionBurdekin 2.385 3.00 14.751 <2e-16 ***
## s(Mnth):RegionFitzroy 2.849 3.00 23.561 <2e-16 ***
## s(Mnth):RegionMackay Whitsunday 2.287 3.00 10.565 <2e-16 ***
## s(Mnth):RegionWet Tropics 2.829 3.00 39.535 <2e-16 ***
## s(reef.alias) 30.160 34.00 8.863 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.634 Deviance explained = 74%
## -REML = 10085 Scale est. = 0.021646 n = 1552
Conclusions:
tidy(wq.gamm3c)
tidy(wq.gamm3c) %>% knitr::kable()
| term | edf | ref.df | statistic | p.value |
|---|---|---|---|---|
| s(Dt.num):RegionBurdekin | 9.341666 | 11.45699 | 12.230099 | 0 |
| s(Dt.num):RegionFitzroy | 14.188452 | 14.99328 | 47.439277 | 0 |
| s(Dt.num):RegionMackay Whitsunday | 8.460311 | 10.41882 | 13.665692 | 0 |
| s(Dt.num):RegionWet Tropics | 15.321056 | 17.42746 | 24.404134 | 0 |
| s(Mnth):RegionBurdekin | 2.385042 | 3.00000 | 14.751271 | 0 |
| s(Mnth):RegionFitzroy | 2.849394 | 3.00000 | 23.561077 | 0 |
| s(Mnth):RegionMackay Whitsunday | 2.287301 | 3.00000 | 10.565150 | 0 |
| s(Mnth):RegionWet Tropics | 2.829132 | 3.00000 | 39.534698 | 0 |
| s(reef.alias) | 30.159614 | 34.00000 | 8.862508 | 0 |
The partial plots within each Region suggested that DOC concentrations peaked around 2014 before declining. We might be interested in describing the extent of this decline (between 2014 and 2016).
Marginalising over Regions, this is:
emmeans(wq.gamm3c, ~Dt.num, at=list(Dt.num=c(2016, 2014)),
type='response') %>%
pairs() %>%
confint()
Conclusions:
Alternatively, we could explore the same declines for each Region.
emmeans(wq.gamm3c, ~Dt.num|Region, at=list(Dt.num=c(2016, 2014)),
type='response') %>%
pairs() %>%
confint
Conclusions:
Perhaps we will start by marginalising over Regions.
wq.list <- with(wq, list(Dt.num = modelr::seq_range(Dt.num, n=100),
Region=levels(Region)))
newdata <- wq.gamm3c %>%
emmeans(~Dt.num, at=wq.list,
type='response',
#data=wq.gamm3c$model
) %>%
as.data.frame
head(newdata)
ggplot(newdata, aes(y=response, x=date_decimal(Dt.num))) +
geom_ribbon(aes(ymin=lower.CL, ymax=upper.CL), fill='blue', alpha=0.3) +
geom_line() +
scale_x_datetime('') +
theme_bw()
## Partial residuals
wq.presid <- with(wq.gamm3c$model,
data.frame(Dt.num, Mnth)) %>%
mutate(Pred = predict(wq.gamm3c, exclude='s(reef.alias)', type='link'),
Resid = wq.gamm3c$resid,
Partial.obs = exp(Pred + Resid))
## Alternatively, computed in a single step:
wq.presid <- wq.presid %>%
  mutate(Resid = exp(
    as.vector(predict(wq.gamm3c, exclude='s(reef.alias)', type='link')) +
    wq.gamm3c$residuals
  ))
head(wq.presid)
ggplot(newdata, aes(y=response, x=date_decimal(Dt.num))) +
geom_ribbon(aes(ymin=lower.CL, ymax=upper.CL), fill='blue', alpha=0.3) +
geom_line() +
geom_point(data=wq.presid, aes(y=Partial.obs)) +
geom_point(data=wq, aes(y=DOC, x=date_decimal(Dt.num)), color='red') +
scale_x_datetime('') +
theme_bw()
Now conditional on Regions.
wq.list = with(wq, list(Dt.num = seq(min(Dt.num), max(Dt.num), len=100),
Region=levels(Region)))
newdata = emmeans(wq.gamm3c, ~Dt.num|Region, at=wq.list, type='response',
data=wq.gamm3c$model) %>% as.data.frame
head(newdata)
ggplot(newdata, aes(y=response, x=date_decimal(Dt.num))) +
geom_ribbon(aes(ymin=lower.CL, ymax=upper.CL), fill='blue', alpha=0.3) +
geom_line() +
scale_x_datetime('') +
facet_wrap(~Region, nrow=1) +
theme_bw()